Abstract

The problem of network-constrained averaging is to compute the average of a set of values distributed
throughout a graph $G$ using an algorithm that can pass messages only along graph edges. We study this
problem in the noisy setting, in which the communication along each link is modeled by an additive
white Gaussian noise channel. We propose a two-phase decentralized algorithm, and we use stochastic
approximation methods in conjunction with spectral graph theory to provide concrete (non-asymptotic)
bounds on the mean-squared error. Having found such bounds, we analyze how the number of iterations
$T_G(n;\delta)$ required to achieve mean-squared error $\delta$ scales as a function of the graph topology and the
number of nodes $n$. Previous work provided guarantees with the number of iterations scaling inversely
with the second smallest eigenvalue of the Laplacian. This paper gives an algorithm that reduces this graph
dependence to the graph diameter, which is the best scaling possible.



I. Introduction

The problem of network-constrained averaging is to compute the average of a set of numbers distributed 
throughout a network, using an algorithm that is allowed to pass messages only along edges of the graph. 
Motivating applications include sensor networks, in which individual motes have limited memory and 
communication ability, and massive databases and server farms, in which memory constraints preclude 
storing all data at a central location. In typical applications, the average might represent a statistical estimate 
of some physical quantity (e.g., temperature, pressure etc.), or an intermediate quantity in a more complex 
algorithm (e.g., for distributed optimization). There is now an extensive literature on network-averaging, 
consensus problems, as well as distributed optimization and estimation (e.g., see the papers ||7l, llT2l . lITOl . 
Il30ll . EOl . iH, H, m, im, lH). The bulk of the earlier work has focused on the noiseless variant, in 



which communication between nodes in the graph is assumed to be noiseless. A more recent line of work 
has studied versions of the problem with noisy communication links (e.g., see the papers lITSl . lITSl . ETl . 
ia, 1291, mi, 1241 and references therein). 

The focus of this paper is a noisy version of network-constrained averaging in which inter-node com- 
munication is modeled by an additive white Gaussian noise (AWGN) channel. Given this randomness, any 
algorithm is necessarily stochastic, and the corresponding sequence of random variables can be analyzed in 
various ways. The simplest question to ask is whether the algorithm is consistent — that is, does it compute 
an approximate average or achieve consensus in an asymptotic sense for a given fixed graph? A more refined 
analysis seeks to provide information about this convergence rate. In this paper, we do so by posing the
following question: for a given algorithm, how does the number of iterations required to compute the average to
within $\delta$-accuracy scale as a function of the graph topology and the number of nodes $n$? For obvious reasons,
we refer to this as the network scaling of an algorithm, and we are interested in finding an algorithm that
has a near-optimal scaling law.

The issue of network scaling has been studied by a number of authors in the noiseless setting, in which the 
communication between nodes is perfect. Of particular relevance here is the work of Benezit et al. fSl, who 
in the case of perfect communication, provided a scheme that has essentially optimal message scaling law 
for random geometric graphs. A portion of the method proposed in this paper is inspired by their scheme, 
albeit with suitable extensions to multiple paths that are essential in the noisy setting. The issue of network 
scaling has also been studied in the noisy setting; in particular, past work by Rajagopal and Wainwright [27] 
analyzed a damped version of the usual consensus updates, and provided scalings of the iteration number 
as a function of the graph topology and size. However, our new algorithm has much better scaling than the
method of [27].

The main contributions of this paper are the development of a novel two-phase algorithm for
network-constrained averaging with noise, and establishing the near-optimality of its network scaling. At a high
level, the outer phase of our algorithm produces a sequence of iterates $\{\theta(\tau)\}_{\tau=0}^{\infty}$ based on a recursive
linear update with decaying step size, as in stochastic approximation methods. The system matrix in this
update is a time-varying and random quantity, whose structure is determined by the updates within the inner
phase. These inner rounds are based on establishing multiple paths between pairs of nodes, and averaging
along them simultaneously. By combining a careful analysis of the spectral properties of this random matrix
with stochastic approximation theory, we prove that this two-phase algorithm computes a $\delta$-accurate version
of the average using a number of iterations that grows with the graph diameter (up to logarithmic factors).^1 As
we discuss in more detail following the statement of our main result, this result is optimal up to logarithmic
factors, meaning that no algorithm can be substantially better in terms of network scaling.

^1 The graph diameter is the minimal number of edges needed to connect any pair of nodes in the graph.

The remainder of this paper is organized as follows. We begin in Section II with background and
formulation of the problem. In Section III, we describe our algorithm, and state various theoretical guarantees
on its performance. We then provide the proof of our main result in Section IV. Section V is devoted to
some simulation results that confirm the sharpness of our theoretical predictions. We conclude the paper in
Section VI.

Notation: For the reader's convenience, we collect here some notation used throughout the paper. The
notation $f(n) = O(g(n))$ means that there exist some constant $c \in (0, \infty)$ and $n_0 \in \mathbb{N}$ such that $f(n) \le c\,g(n)$
for all $n \ge n_0$, whereas $f(n) = \Omega(g(n))$ means that $f(n) \ge c'\,g(n)$ for all $n \ge n_0$. The notation
$f(n) = \Theta(g(n))$ means that $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$, we denote
its ordered sequence of eigenvalues by $\lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_n(A)$ and its $\ell_2$-operator norm by
$|||A|||_2 = \sup_{\|v\|_2 = 1} \|Av\|_2$. Finally, we use $\langle \cdot, \cdot \rangle$ to denote the Euclidean inner product.

II. Background and problem set-up 
We begin in this section by introducing necessary background and setting up the problem more precisely. 

A. Network-constrained averaging 

Consider a collection $\{\theta_i(0),\ i = 1, \ldots, n\}$ of $n$ numbers. In statistical settings, these numbers would
be modeled as independent and identically distributed (i.i.d.) draws from an unknown distribution $Q$ with mean $\mu$. In a
centralized setting, a standard estimator for the mean is the sample average $\bar{\theta} := \frac{1}{n}\sum_{i=1}^{n} \theta_i(0)$. When all
of the data can be aggregated at a central location, then computation of $\bar{\theta}$ is straightforward. In this paper,
we consider the network-constrained version of this estimation problem, modeled by an undirected graph
$G = (V, E)$ that consists of a vertex set $V = \{1, \ldots, n\}$ and a collection of edges $E$ joining pairs of vertices.
For $i \in V$, we view each measurement $\theta_i(0)$ as associated with vertex $i$. (For instance, in the context of
sensor networks, each vertex would contain a mote and collect observations of the environment.) The edge
structure of the graph enforces communication constraints on the processing: in particular, the presence of
edge $(i, j)$ indicates that it is possible for sensors $i$ and $j$ to exchange information via a noisy communication
channel. Conversely, sensor pairs that are not joined by an edge are not permitted to communicate directly.^2
Every node has a synchronized internal clock, and acts at discrete times $t = 1, 2, \ldots$. For any given pair of
sensors $(i, j) \in E$, we assume that the message sent from $i$ to $j$ is perturbed by an independent identically
distributed $N(0, \sigma^2)$ variate. Although this additive white Gaussian noise (AWGN) model is more realistic
than a noiseless model, it is conceivable (as pointed out by one of the reviewers) that other stochastic channel
models might be more suitable for certain types of sensor networks, and we leave this exploration for future
research.

^2 Since the edges are undirected, there is no difference between edge $(i, j)$ and $(j, i)$; moreover, we exclude self-edges,
meaning that $(i, i) \notin E$ for all $i \in V$.

Given this set-up, of interest to us are stochastic algorithms that generate sequences $\{\theta(t)\}_{t=0}^{\infty}$ of iterates
contained within $\mathbb{R}^n$, and we require that the algorithm be graph-respecting, meaning that in each iteration,
it is allowed to send at most one message for each direction of every edge $(i, j) \in E$. At time $t$, we measure
the distance between $\theta(t)$ and the desired average $\bar{\theta}$ via the average (per node) mean-squared error, given
by

$$\mathrm{MSE}(\theta(t)) := \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[(\theta_i(t) - \bar{\theta})^2\big]. \qquad (1)$$

In this paper, our goal is for every node to compute the average $\bar{\theta}$ up to an error tolerance $\delta$. In addition,
we require almost sure consensus among nodes, meaning

$$\mathbb{P}\big[\theta_i(t) = \theta_j(t) \ \ \forall\, i, j = 1, 2, \ldots, n\big] \to 1 \quad \text{as } t \to \infty.$$

Our primary goal is to characterize the rate of convergence as a function of the graph topology and the
number of nodes, to which we refer as the network-scaling function of the algorithm. More precisely, in
order to study this network scaling, we consider sequences of graphs $\{G_n\}$ indexed by the number of nodes
$n$. For any given algorithm (defined for each graph $G_n$) and a fixed tolerance parameter $\delta > 0$, our goal is
to determine bounds on the quantity

$$T_G(n;\delta) := \inf\{\, t = 1, 2, \ldots \ \mid\ \mathrm{MSE}(\theta(t)) \le \delta \,\}. \qquad (2)$$

Note that $T_G(n;\delta)$ is a stopping time, given by the smallest number of iterations required to obtain
mean-squared error less than $\delta$ on a graph of type $G$ with $n$ nodes.
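
As an illustration of definitions (1) and (2), the following minimal sketch estimates the per-node MSE from simulated runs and reads off the stopping time from an MSE trajectory. The function names `mean_squared_error` and `stopping_time`, and the convention that index 0 of the trajectory holds the initial MSE, are assumptions made for this example only.

```python
def mean_squared_error(samples, target):
    """Empirical version of (1): `samples` is a list of independent runs,
    each run being the list of node values theta_i(t) at a fixed time t."""
    n = len(samples[0])
    total = 0.0
    for run in samples:
        total += sum((x - target) ** 2 for x in run) / n
    return total / len(samples)


def stopping_time(mse_trajectory, delta):
    """Empirical version of (2): smallest t >= 1 with MSE(theta(t)) <= delta.

    mse_trajectory[t] is the MSE after t iterations (index 0 is the initial
    MSE). Returns None if the tolerance is never reached in the trajectory.
    """
    for t in range(1, len(mse_trajectory)):
        if mse_trajectory[t] <= delta:
            return t
    return None
```

For instance, with the trajectory `[1.0, 0.5, 0.2, 0.05, 0.01]` and tolerance 0.1, the first index at or below the tolerance is 3.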

B. Graph topologies 

Of course, the question that we have posed will depend on the graph type, and this paper analyzes three
types of graphs, as shown in Figure 1. The first two graphs have regular topologies: the single cycle graph
in panel (a) is degree two-regular, and the two-dimensional grid graph in panel (b) is degree four-regular.
In addition, we also analyze an important class of random graphs with irregular topology, namely the class
of random geometric graphs. As illustrated in Figure 1(c), a random geometric graph (RGG) in the plane
is formed by placing $n$ nodes uniformly at random in the unit square $[0,1] \times [0,1]$, and then
connecting two nodes if their Euclidean distance is less than some radius $r(n)$. It is known that an RGG
will be connected with high probability as long as $r(n) = \Omega\big(\sqrt{\log n / n}\big)$; see Penrose [26] for discussion of
this and other properties of random geometric graphs.

Fig. 1. Illustration of graph topologies. (a) A single cycle graph. (b) Two-dimensional grid with
four-nearest-neighbor connectivity. (c) Illustration of a random geometric graph (RGG). Two nodes are connected if their
distance is less than $r(n)$. The solid circles represent the centers of squares.

A key graph-theoretic parameter relevant to our analysis is the graph diameter, denoted by
$D_n = \mathrm{diam}(G_n)$. The path distance between any pair of nodes is the length of the shortest path joining them
in the graph, and by definition, the graph diameter is the maximum path distance taken over all node pairs
in the graph. It is straightforward to see that $D_n = \Theta(n)$ for the single cycle graph, and that $D_n = \Theta(\sqrt{n})$
for the two-dimensional grid. For a random geometric graph with radius chosen to ensure connectivity, it is
known that $D_n = \Theta\big(\sqrt{n / \log n}\big)$.
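
The diameter scalings for the two regular topologies can be checked numerically on small instances. The sketch below is only an illustration: the helper names (`cycle_graph`, `grid_graph`, `bfs_distances`, `diameter`) are hypothetical, and the graphs are stored as plain adjacency dictionaries.

```python
from collections import deque


def cycle_graph(n):
    """Single cycle on n >= 3 nodes, as an adjacency dictionary."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def grid_graph(m):
    """m x m grid with four-nearest-neighbor connectivity (n = m * m nodes)."""
    adj = {}
    for r in range(m):
        for c in range(m):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(a, b) for a, b in candidates
                           if 0 <= a < m and 0 <= b < m]
    return adj


def bfs_distances(adj, source):
    """Path distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Maximum path distance over all node pairs (graph assumed connected)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)
```

On a cycle with 10 nodes this gives a diameter of 5, and on a 4 x 4 grid it gives 6, in line with the linear and square-root growth stated above.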

Finally, in order to simplify the routing problem explained later, we divide the unit square into subregions
(squares) of side length $\sqrt{1/n}$ in the case of the grid, and, for some constant $c > 0$, of side length
$\sqrt{c \log n / n}$ in the case of the RGG. We assume that each node knows its location and is aware of the centers
of these $m^2$ subregions, namely $(x_i, y_j)$, $i, j = 1, 2, \ldots, m$, where $m = \sqrt{n}$ for the regular grid, and
$m = \sqrt{n / (c \log n)}$ for the RGG.

As a convention, we assume that (xi, yi) is the left bottom square, to which we refer to as the first square. 
By construction, in a regular grid, each square will contain one and only one node which is located at the 
center of the square. From known properties of RGGs ll26l . lUTI . each of the given subregions will contain at 
least one node with high probability (w.h.p.). Moreover, an RGG is regular w.h.p., meaning that each square
contains $\Theta(\log n)$ nodes (see Lemma 1 in the paper [12]). Accordingly, in the remainder of the paper, we
assume without loss of generality that any given RGG is regular. Note that by construction, the transmission
radius $r(n)$ is selected so that each node in each square is connected to every other node in the four adjacent
squares.
III. Algorithm and its properties

In this section we state our main result which is followed by a detailed description of the proposed 
algorithm. 



A. Theoretical guarantees 

Our main result guarantees the existence of a graph-respecting algorithm with desirable properties. Recall
the definition of a graph-respecting scheme, as well as the definition of our AWGN channel model given
in Section II. In the following statement, the quantity $c_0$ denotes a universal constant, independent of $n$, $\delta$,
and $\sigma^2$.

Theorem 1. For the communication model in which each link is an AWGN channel with variance $\sigma^2$, there
is a graph-respecting algorithm such that:

a) Nodes almost surely reach a consensus. More precisely, we have

$$\theta(t) \to \tilde{\theta}\,\vec{1} \quad \text{as } t \to \infty, \qquad (3)$$

for some $\tilde{\theta} \in \mathbb{R}$.

b) After $T = T_G(n;\delta)$ iterations, the algorithm satisfies the following bounds on $\mathrm{MSE}(\theta(T))$:

i) For fixed tolerance $\delta > 0$ sufficiently small, we have $\mathrm{MSE}(\theta(T)) \le 3\sigma^2\delta$ after



T.,.(n;*)<c„„ma.{l.ogl.M™} 



iterations for a single cycle graph.
ii) For fixed tolerance $\delta > 0$ sufficiently small, we have $\mathrm{MSE}(\theta(T)) = O(\sigma^2\delta)$ after



^ . e^ r- fl, 1 MSE(0(O))1 
Tgridin; 6) < cq Vn max <^ - log - , '> 



iterations for the regular grid in two dimensions.
iii) Assume that $\delta = \epsilon / (\log n)^2$ for some fixed $\epsilon$ sufficiently small. Then we have $\mathrm{MSE}(\theta(T)) = O(\sigma^2\epsilon)$
after 

, c-N rr. V? fl, (logn)2 MSE(6'(0)) 1 

TRGG(n; b) < CO ^ni^ognf max <^ ^ log , XX^ \ 

L 6 a^o^ ) 

iterations for a regular random geometric graph.
Here $c_0$ is some constant independent of $n$, $\delta$, and $\sigma^2$, whose value may change from line to line.

Remarks: A few comments are in order regarding the interpretation of this result. First, it is worth mentioning
that the quality of the different links does not have to be the same. Similar arguments apply to the case where
the noises have different variances. Second, although nodes almost surely reach a consensus, as guaranteed in
part (a), this consensus value is not necessarily the same as the sample mean $\bar{\theta}$. The choice of the symbol $\tilde{\theta}$
is intentional to emphasize this point. However, as guaranteed by part (b), this consensus value is close to the
actual sample mean, with mean-squared error controlled by the bounds stated there. Since the sample mean itself
represents a noisy estimate of some underlying population quantity, there is little point to computing it to
arbitrary accuracy. Third, it is worthwhile comparing part (b)
with previous results on network scaling in the noisy setting. Rajagopal and Wainwright [27] analyzed
a simple set of damped updates, and showed that $T_{\mathrm{cyc}}(n;\delta) = O(n^2)$ for the single cycle, and that
$T_{\mathrm{grid}}(n;\delta) = O(n)$ for the two-dimensional grid. By comparison, the algorithm proposed here and our analysis
thereof have removed factors of $n$ and $\sqrt{n}$ from this scaling.

B. Optimality of the results 

As we now discuss, the scalings in Theorem 1 are optimal for the cases of cycle and grid and near-optimal
(up to a logarithmic factor) for the case of RGG. In an adversarial setting, any algorithm needs at least $\Omega(D_n)$
iterations, where $D_n$ denotes the graph diameter, in order to approximate the average; otherwise, some node
will fail to have any information from some subset of other nodes (and their values can be set in a worst-case
manner). Theorem 1 provides upper bounds on the number of iterations that, at most, are within logarithmic
factors of the diameter, and hence are also within logarithmic factors of the optimal latency scaling law.
For the graphs given here, the scalings are also optimal in a non-adversarial setting, in which $\{\theta_i(0)\}_{i=1}^{n}$
are modeled as chosen i.i.d. from some distribution. Indeed, for a given node $j \in V$ and positive integer $t$,
we let $N(j;t)$ denote the depth-$t$ neighborhood of $j$, meaning the set of nodes that are connected to $j$ by a
path of length at most $t$. We then define the graph spreading function $\psi_G(t) = \min_{j \in V} |N(j;t)|$. Note that
the function $\psi_G$ is non-decreasing, so that we may define its inverse function $\psi_G^{-1}(s) = \inf\{t \mid \psi_G(t) \ge s\}$.
As some examples:

• for a cycle on $n$ nodes, we have $\psi_G(t) = 2t$, and hence $\psi_G^{-1}(s) = s/2$.

• for a grid on $n$ nodes in two dimensions, we have the upper bound $\psi_G(t) \le 2t^2$, and hence the lower bound
$\psi_G^{-1}(s) \ge \sqrt{s/2}$.

• for a random geometric graph (RGG), we have the upper bound $\psi_G(t) = \Theta(t^2 \log n)$, which implies
the lower bound $\psi_G^{-1}(s) = \Theta\big(\sqrt{s / \log n}\big)$.

After $t$ steps, a given node can gather the information of at most $\psi_G(t)$ nodes. For the average based on
these nodes to be comparable to $\bar{\theta}$, we require that $\psi_G(t) = \Omega(n)$, and hence the iteration number $t$ should
be at least $\Omega(\psi_G^{-1}(n))$. For the three graphs considered here, this leads to the same conclusion, namely that
$\Omega(D_n)$ iterations are required. We note also that using information-theoretic techniques, Ayaso et al. [1]
proved a lower bound on the number of iterations for a general graph in terms of the Cheeger constant [9].
For the graphs considered here, the Cheeger constant is of the order of the diameter.
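
The spreading function and its inverse can be evaluated by breadth-first search on a small cycle. The sketch below is illustrative only; the function names are hypothetical, and the neighborhood $N(j;t)$ is taken to exclude $j$ itself, which is the convention under which $\psi_G(t) = 2t$ holds on the cycle.

```python
from collections import deque


def cycle_graph(n):
    """Single cycle on n >= 3 nodes, as an adjacency dictionary."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def neighborhood_size(adj, j, t):
    """Number of nodes other than j within path distance t of j."""
    dist = {j: 0}
    queue = deque([j])
    while queue:
        u = queue.popleft()
        if dist[u] == t:          # do not expand beyond depth t
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist) - 1


def spreading(adj, t):
    """psi_G(t): the smallest depth-t neighborhood size over all nodes."""
    return min(neighborhood_size(adj, j, t) for j in adj)


def spreading_inverse(adj, s):
    """Smallest t with psi_G(t) >= s; s must not exceed len(adj) - 1."""
    if s > len(adj) - 1:
        raise ValueError("s exceeds the number of other nodes in the graph")
    t = 0
    while spreading(adj, t) < s:
        t += 1
    return t
```

On a cycle with 20 nodes, gathering information from all 19 other nodes requires 10 steps, which equals the diameter of that cycle.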

C. Description of algorithm 

We now describe the algorithm that achieves the bounds stated in Theorem 1. At the highest level, the
algorithm can be divided into two types of phases: an inner phase, and an outer phase. The outer phase
produces a sequence of iterates $\{\theta(\tau)\}$, where $\tau = 0, 1, 2, \ldots$ is the outer time scale parameter. By design
of the algorithm, each update of the outer parameters requires a total of $M$ message-passing rounds (these
rounds corresponding to the inner phase), where in each round the algorithm can pass at most two messages
per edge (one for each direction). In a nutshell, the algorithm is based on establishing
multiple routes, averaging along them in an inner phase, and updating the estimates based on the noisy
version of averages along routes in an outer phase. Consequently, if we use the estimate $\theta(\tau)$, then in the
language of Theorem 1, it corresponds to $T = M\tau$ rounds of message-passing. Our goal is to establish upper
bounds on $T$ that guarantee the MSE is $O(\sigma^2\delta)$. Figure 2 illustrates the basic operations of the algorithm.

Two-phase algorithm for distributed consensus: 

• Inner phase: 

- Deciding the averaging direction 

- Choosing the head nodes 

- Establishing the routes 

- Averaging along the routes 

• Outer phase: 

- Based on the averages along the routes, update the estimates according to

$\theta(\tau + 1) = \theta(\tau) - \epsilon(\tau)\{L(\tau)\theta(\tau) + v(\tau)\}$

Fig. 2: Basic operations of a two-phase algorithm for distributed consensus. 

1) Outer phase: In the outer phase, we produce a sequence of iterates $\{\theta(\tau)\}_{\tau=0}^{\infty}$ according to the recursive
update

$$\theta(\tau + 1) = \theta(\tau) - \epsilon(\tau)\{L(\tau)\theta(\tau) + v(\tau)\}. \qquad (4)$$

Here $\{\epsilon(\tau)\}_{\tau=0}^{\infty}$ is a sequence of positive decreasing stepsizes. For a given precision $\delta$, we set
$\epsilon(\tau) = 1/(1/\delta + \tau)$. For each $\tau$, the quantity $L(\tau) \in \mathbb{R}^{n \times n}$ is a random matrix, whose structure is determined by the
inner phase, and $v(\tau) \in \mathbb{R}^n$ is an additive Gaussian term, whose structure is also determined in the inner
phase. As will become clear in the sequel, even though $L$ and $v$ are dependent, they are both independent
of $\theta$. Moreover, given $L$, the random vector $v$ is Gaussian with bounded variance.
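
To make the shape of update (4) concrete, here is a minimal sketch of the outer recursion under simplifying assumptions that are not part of the algorithm's specification: the matrix $L(\tau)$ is replaced by the fixed matrix $I - \vec{1}\vec{1}^T/n$ (a single route covering all nodes), and the noise entries are drawn as independent $N(0, \sigma^2)$ variables. The function name `outer_phase` is hypothetical.

```python
import random


def outer_phase(theta0, delta, sigma, num_steps, rng):
    """Iterate theta <- theta - eps(tau) * (L theta + v) with the stepsize
    eps(tau) = 1 / (tau + 1/delta).

    Assumptions for illustration: L = I - 11^T/n, so (L theta)_i is the
    deviation of theta_i from the current mean, and v has independent
    N(0, sigma^2) entries.
    """
    theta = list(theta0)
    n = len(theta)
    for tau in range(num_steps):
        eps = 1.0 / (tau + 1.0 / delta)
        mean = sum(theta) / n
        theta = [x - eps * ((x - mean) + rng.gauss(0.0, sigma)) for x in theta]
    return theta
```

In the noiseless case the deviation from the mean is multiplied by $1 - \epsilon(\tau)$ at every step; with $\delta = 1/2$ the product over the first nine steps telescopes to $1/10$, so starting from $(0, 2)$ the iterate after nine steps is $(0.9, 1.1)$.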

2) Inner phase: The inner phase is the core of the algorithm and it involves a number of steps, as we
describe here. We use $s = 1, 2, \ldots, M$ to index the iterations within any inner phase, and use $\{\gamma(s)\}_{s=1}^{M}$
to denote the sequence of inner iterates within $\mathbb{R}^n$. For the inner phase corresponding to the outer update
from $\theta(\tau) \to \theta(\tau + 1)$, the inner phase takes the initialization $\gamma(1) \leftarrow \theta(\tau)$, and then produces as output
$\gamma(M) \to \theta(\tau + 1)$ to the outer iteration. The inner phase can be broken down into three steps,
which we now describe in detail.

a) Step 1, deciding the averaging direction: The first step is to choose a direction in which to perform
averaging. In a single cycle graph, since left and right are viewed as the same, there is only one choice, and
hence nothing to be decided. In contrast, the grid or RGG graphs require a decision-making phase, which
proceeds as follows. One node in the first (bottom left) square wakes up and chooses uniformly at random
to send in the horizontal or vertical direction. We code this decision using the random variable $\zeta \in \{-1, 1\}$,
where $\zeta = -1$ (respectively $\zeta = +1$) represents the horizontal (respectively vertical) direction. To simplify
matters, we assume in the remainder of this description that the averaging direction is horizontal, with the
modifications required for vertical averaging being standard.

b) Step 2, choosing the head nodes: This step applies only to the grid and RGG graphs. Given our
assumption that the node in the first square has chosen the horizontal direction, it then passes a token message
to a randomly selected node in the above adjacent square. The purpose of this token is to determine which
node (referred to as the head node) should be involved in establishing the route passing through the given
square. After receiving the token, the receiving node passes it to another randomly selected node in the above
adjacent square and so on. Note that in the special case of the grid, there is only one node in each square, and so
no choices are required within squares. After $m$ rounds, one node in each square $(x_1, y_j)$, $j = 1, 2, \ldots, m$
(respectively $(x_i, y_1)$, $i = 1, 2, \ldots, m$, for vertical averaging) receives the token, as illustrated in Figure 3. Note that again in a single cycle
graph, there is nothing to be decided, since the direction and head nodes are all determined.

c) Step 3, establishing routes and averaging: In this phase, each of the head nodes establishes a horizontal
path, and then performs averaging along the path, as illustrated in Figure 3(b). This part of the algorithm involves
three substeps, which we now describe in detail.

Fig. 3. (a) The node labeled $s_{11}$ in the first square chooses the horizontal direction for averaging ($\zeta = -1$); it
passes the token vertically to inform other nodes to average horizontally. Nodes who receive the token pass it
to another node in the above adjacent square. (b) The head nodes $s_{1j}$, $j = 1, 2, \ldots, m$, as determined in the first
step, establish routes horizontally ($P_j$, $j = 1, 2, \ldots, m$) and then average along these paths.

• For $j = 1, 2, \ldots, m$, each head node $s_{1j}$ selects a node $s_{2j}$ uniformly at random (u.a.r.) from within the
right adjacent square, and passes to it the quantity $\gamma_{s_{1j}}(1)$. Given the Gaussian noise model, node $s_{2j}$ then
receives the quantity

$$\tilde{\gamma}_{s_{1j}}(1) = \gamma_{s_{1j}}(1) + v_{1j}, \quad \text{where } v_{1j} \sim N(0, \sigma^2),$$

and then updates its own local variable as $\gamma_{s_{2j}}(2) = \gamma_{s_{2j}}(1) + \tilde{\gamma}_{s_{1j}}(1)$. We then iterate this same procedure;
that is, node $s_{2j}$ selects another node $s_{3j}$ u.a.r. from its right adjacent square, and passes the message $\gamma_{s_{2j}}(2)$.
Overall, at round $i$ of this update procedure, we have

$$\gamma_{s_{(i+1)j}}(i + 1) = \gamma_{s_{(i+1)j}}(i) + \tilde{\gamma}_{s_{ij}}(i),$$

where $\tilde{\gamma}_{s_{ij}}(i) = \gamma_{s_{ij}}(i) + v_{ij}$, and $v_{ij} \sim N(0, \sigma^2)$. At the end of round $m$, node $s_{mj}$ can compute a noisy
version of the average along the path $P_j : s_{1j} \to s_{2j} \to \cdots \to s_{mj}$, in particular via the rescaled quantity

$$\eta_j := \frac{\gamma_{s_{mj}}(m)}{m} = \frac{1}{m} \sum_{\ell=1}^{m} \theta_{s_{\ell j}}(\tau) + v_j, \qquad j = 1, 2, \ldots, m.$$

Here the variable $v_j \sim N(0, \sigma^2/m)$, since the noise variables associated with different edges are independent.

• At this point, for each $j = 1, 2, \ldots, m$, each node $s_{mj}$ has the noisy version $\eta_j$ of the path
average along route $P_j$, and can share this information with other nodes in the path by sending $\eta_j$ back to
the head node. A naive way to do this is as follows: node $s_{mj}$ makes $m$ copies of $\eta_j$, namely $\eta_j^{(\ell)} = \eta_j$,
$\ell = 1, 2, \ldots, m$, and starts transmitting one copy at a time back to the head node. Nodes along the path
simply forward what they receive, so that after $m - i + m - 1$ time steps, node $s_{ij}$ receives $m$ noisy
copies of the average, $\tilde{\eta}_{ij}^{(\ell)} = \eta_j^{(\ell)} + v_{ij}^{(\ell)}$, where $v_{ij}^{(\ell)} \sim N(0, (m - i)\sigma^2)$. Averaging the $m$ copies, node $s_{ij}$
can compute the quantity

$$\gamma_{s_{ij}}(2m - i - 1) := \frac{1}{m} \sum_{\ell=1}^{m} \tilde{\eta}_{ij}^{(\ell)} = \frac{1}{m} \sum_{\ell=1}^{m} \theta_{s_{\ell j}}(\tau) + w_{ij},$$

where $w_{ij} = v_j + \frac{1}{m} \sum_{\ell=1}^{m} v_{ij}^{(\ell)}$. Since the noise on different links and different time steps are independent
Gaussian random variables, we have $w_{ij} \sim N(0, \sigma_i^2)$, with

$$\sigma_i^2 = \frac{1}{m}\sigma^2 + \Big(\frac{m - i}{m}\Big)\sigma^2 = \Big(\frac{m - i + 1}{m}\Big)\sigma^2 \le \sigma^2.$$

Therefore, at the end of $M = O(m)$ rounds, for each $j = 1, 2, \ldots, m$, all nodes have the average of the
estimates in the path $P_j$ that is perturbed by Gaussian noise with variance at most $\sigma^2$. Since $m = \Theta(D_n)$,
we have $M = \Theta(D_n)$.

• At the end of the inner phase $\tau$, nodes that were involved in a path use their estimate of the average
along the path to update $\theta(\tau)$, while the estimates of the nodes that were not involved in any route remain the
same. A given node $s_{ij}$ on a path updates its estimate via

$$\theta_{s_{ij}}(\tau + 1) = \{1 - \epsilon'(\tau)\}\,\theta_{s_{ij}}(\tau) + \epsilon'(\tau)\,\gamma_{s_{ij}}(M), \qquad (5)$$

where $\epsilon'(\tau) = \Theta\big(\frac{1}{\tau + 1/\delta}\big)$. On the other hand, using $\langle \cdot, \cdot \rangle$ to denote the Euclidean inner product, we have
$\gamma_{s_{ij}}(M) = \langle w, \theta(\tau) \rangle + v_{s_{ij}}$, where $w$ is the averaging vector of the route $P_j$ with the entries $w(s_{\ell j}) = \frac{1}{m}$
for $\ell = 1, 2, \ldots, m$, and zero otherwise. Combining the scalar updates (5) yields the matrix-form update

$$\theta(\tau + 1) = \theta(\tau) - \epsilon'(\tau)\{(I - W(\tau))\theta(\tau) + v'(\tau)\}, \qquad (6)$$

where the matrix $W(\tau) = W(\tau; P_1, P_2, \ldots, P_m, \zeta)$ is a random averaging matrix induced by the choice
of routes $P_1, P_2, \ldots, P_m$ and the random direction $\zeta$. The noise vector $v'(\tau) \sim N(0, C')$ is additive
noise. Note that for any given time, the noise at different nodes is correlated via the matrix $C'$, but for
different time instants $\tau \ne \tau'$, the noise vectors $v'(\tau)$ and $v'(\tau')$ are independent. Moreover, from our
earlier arguments, we have the upper bound $\max_{i=1,\ldots,n} C'_{ii} \le \sigma^2$.
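
The forward accumulation and backward forwarding along a single route can be simulated directly. The sketch below is an illustration under stated assumptions: the function names `route_average` and `head_variance` are hypothetical, one route with $m$ nodes is simulated in isolation, and every hop adds an independent $N(0, \sigma^2)$ perturbation.

```python
import random


def route_average(theta, sigma, rng):
    """Simulate one route s_1 -> ... -> s_m and return each node's noisy
    estimate of the path average (index 0 is the head node)."""
    m = len(theta)
    # Forward pass: each node adds its value to the noisy running sum.
    running = theta[0]
    for i in range(1, m):
        running = theta[i] + running + rng.gauss(0.0, sigma)
    eta = running / m  # noisy path average held by the last node
    # Backward pass: the last node sends m copies toward the head node;
    # every hop perturbs the forwarded copy with fresh noise.
    received = [[] for _ in range(m)]
    for _ in range(m):
        value = eta
        received[m - 1].append(value)
        for i in range(m - 2, -1, -1):
            value += rng.gauss(0.0, sigma)
            received[i].append(value)
    # Each node averages the m copies it has seen.
    return [sum(copies) / m for copies in received]


def head_variance(m, sigma, trials, rng):
    """Empirical variance of the head node's estimate for all-zero inputs."""
    samples = [route_average([0.0] * m, sigma, rng)[0] for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((x - mean) ** 2 for x in samples) / trials
```

With zero noise every node recovers the exact path average. With $\sigma = 1$ and $m = 5$, the empirical variance at the head node should stay near or below $\sigma^2$, consistent with the bound $\sigma_i^2 \le \sigma^2$ derived above.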

IV. Proof of Theorem 1

We now turn to the proof of Theorem 1. At a high level, the structure of the argument consists of
decomposing the vector $\theta(\tau) \in \mathbb{R}^n$ into a sum of two terms: a component within the consensus subspace
(meaning all values of the vector are identical), and a component in the orthogonal complement. Using this
decomposition, the mean-squared error splits into a sum of two terms and we use standard techniques to
bound them. As will be shown, these bounds depend on the parameter $\delta$, the noise variance, the initial MSE,
and finally the (inverse) spectral gap of the update matrix. The final step is to lower bound the spectral gap
of our update matrix.



A. Setting up the proof 

Recalling the averaging matrix $W(\tau)$ from the update (6), we define the Laplacian matrix $S(\tau) := I - W(\tau)$.
We then define the average matrix $\overline{W} := \mathbb{E}[W(\tau)]$, where the expectation is taken over the randomness
due to the choice of routes;^3 in a similar way, we define the associated (average) Laplacian $S := I - \overline{W}$.
Finally, we define the rescaled quantities

$$\epsilon(\tau) := \lambda_2(S)\,\epsilon'(\tau), \qquad L(\tau) := \frac{1}{\lambda_2(S)}\,S(\tau), \qquad \text{and} \qquad v(\tau) := \frac{1}{\lambda_2(S)}\,v'(\tau), \qquad (7)$$

where we recall that $\lambda_2(\cdot)$ denotes the second smallest eigenvalue of a symmetric matrix. In terms of these
rescaled quantities, our algorithm has the form

$$\theta(\tau + 1) = \theta(\tau) - \epsilon(\tau)\big[L(\tau)\theta(\tau) + v(\tau)\big], \qquad (8)$$

as stated previously in the update equation (4). Moreover, by construction, we have $v(\tau) \sim N(0, C)$ where
$C = \frac{1}{[\lambda_2(S)]^2}\,C'$. We also, for theoretical convenience, set

$$\epsilon'(\tau) = \frac{1}{\lambda_2(S)\,(\tau + 1/\delta)}, \qquad (9)$$

or equivalently $\epsilon(\tau) = \frac{1}{\tau + 1/\delta}$ for $\tau = 1, 2, \ldots$.

We first claim that the matrix $\overline{W}$ is symmetric and (doubly) stochastic. The symmetry follows from the
fact that different routes do not collide, whereas the matrix is stochastic because every row of $W(\tau)$ (depending
on whether the node corresponding to that row participates in a route or not) either represents an averaging
along a route or is the corresponding row of the identity matrix. Consequently, we can interpret $\overline{W}$ as
the transition matrix of a reversible Markov chain. It is an irreducible Markov chain, because within any
updating round, there is a positive chance of averaging nodes that are in the same column or row, which
implies that the associated Markov chain can transition from one state to any other in at most two steps.
Moreover, the stationary distribution of the chain is uniform (i.e., $\pi = \vec{1}/n$).

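We now use these properties to simplify our study of the sequence $\{\theta(\tau)\}_{\tau=1}^{\infty}$ generated by the
update equation (8). Since $S$ is real and symmetric, it has the eigenvalue decomposition $S = U \Lambda U^T$,
where $U = [u_1 \ u_2 \ \cdots \ u_n]$ is a unitary matrix (that is, $U^T U = I_n$). Moreover, we have
$\Lambda = \mathrm{diag}\{\lambda_1(S), \lambda_2(S), \ldots, \lambda_n(S)\}$, where $\lambda_i(S)$ is the eigenvalue corresponding to the eigenvector $u_i$, for
$i = 1, \ldots, n$. Writing $L := \mathbb{E}[L(\tau)] = \frac{1}{\lambda_2(S)}(I - \overline{W})$, the eigenvalues of $L$ and $\overline{W}$ are related via

$$\lambda_i(L) = \frac{1}{\lambda_2(S)}\big(1 - \lambda_{n+1-i}(\overline{W})\big) = \frac{1 - \lambda_{n+1-i}(\overline{W})}{1 - \lambda_{n-1}(\overline{W})}.$$

Since the largest eigenvalue of an irreducible Markov chain is one (with multiplicity one) [16], we have
$1 = \lambda_n(\overline{W}) > \lambda_{n-1}(\overline{W}) \ge \cdots \ge \lambda_1(\overline{W})$, or equivalently

$$0 = \lambda_1(L) < \lambda_2(L) \le \cdots \le \lambda_n(L),$$

with $\lambda_2(L) = 1$. Moreover, we have $S\vec{1} = L\vec{1} = 0$, so that the first eigenvector $u_1 = \vec{1}/\sqrt{n}$ corresponds
to the eigenvalue $\lambda_1(L) = 0$. Let $\tilde{U}$ denote the matrix obtained from $U$ by deleting its first column, $u_1$.
Since the smallest eigenvalue of $L$ is zero, we may write $L = \tilde{U} \tilde{\Lambda} \tilde{U}^T$, where $\tilde{\Lambda} = \mathrm{diag}\{\lambda_2(L), \ldots, \lambda_n(L)\}$,
$\tilde{U}^T \tilde{U} = I_{n-1}$, and $\tilde{U}\tilde{U}^T = I_n - \vec{1}\vec{1}^T/n$. With this notation, our analysis is based on the decomposition

$$\theta(\tau) = \alpha(\tau)\,\frac{\vec{1}}{\sqrt{n}} + \tilde{U}\beta(\tau), \qquad (10)$$

where we have defined $\alpha(\tau) := \langle \vec{1}/\sqrt{n},\, \theta(\tau) \rangle \in \mathbb{R}$ and $\beta(\tau) := \tilde{U}^T \theta(\tau) \in \mathbb{R}^{n-1}$. Since $\vec{1}^T L(\tau) = 0^T$ for
all $\tau = 1, 2, \ldots$, from the decomposition (10) and the form of the updates (8), we have the following
recursions:

$$\alpha(\tau + 1) = \alpha(\tau) - \epsilon(\tau)\,\frac{\vec{1}^T}{\sqrt{n}}\,v(\tau), \qquad \text{and} \qquad (11)$$

$$\beta(\tau + 1) = \beta(\tau) - \epsilon(\tau)\big\{\tilde{L}(\tau)\beta(\tau) + \tilde{U}^T v(\tau)\big\}. \qquad (12)$$

Here $\tilde{L}(\tau)$ is an $(n - 1) \times (n - 1)$ matrix defined by the relation

$$U^T L(\tau)\, U = \begin{bmatrix} 0 & 0^T \\ 0 & \tilde{L}(\tau) \end{bmatrix}.$$

^3 For the single cycle graph, there is only one route that involves all the nodes at each round, so $W(\tau)$ is deterministic in this
case.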
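
The decomposition (10) can be sanity-checked numerically without computing eigenvectors, because $\tilde{U}\tilde{U}^T = I_n - \vec{1}\vec{1}^T/n$ implies that $\|\beta\|_2^2$ equals the squared norm of the mean-removed vector. The sketch below uses hypothetical function names and verifies the resulting Pythagorean identity $\|\theta - c\,\vec{1}\|_2^2 = (\alpha - \sqrt{n}\,c)^2 + \|\beta\|_2^2$ for an arbitrary scalar $c$.

```python
import math


def decompose(theta):
    """Return (alpha, ||beta||^2) for theta = alpha * 1/sqrt(n) + U~ beta."""
    n = len(theta)
    alpha = sum(theta) / math.sqrt(n)  # inner product with 1/sqrt(n)
    mean = sum(theta) / n
    # ||beta||^2 = ||U~^T theta||^2 = ||(I - 11^T/n) theta||^2
    beta_norm_sq = sum((x - mean) ** 2 for x in theta)
    return alpha, beta_norm_sq


def squared_error(theta, target):
    """Squared Euclidean distance between theta and target * 1."""
    return sum((x - target) ** 2 for x in theta)
```

For $\theta = (1, 4, 7, 0)$ and $c = 2.5$, the decomposition gives $\alpha = 6$ and $\|\beta\|_2^2 = 30$, and both sides of the identity equal 31.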



B. Main steps 

As we show, part (a) of the theorem requires some intermediate results of the proof of part (b). Accordingly,
we defer it to the end of the section. With this set-up, we now state the two main technical lemmas that
form the core of Theorem 1. Our first lemma concerns the behavior of the component sequences $\{\alpha(\tau)\}_{\tau=0}^{\infty}$
and $\{\beta(\tau)\}_{\tau=0}^{\infty}$, which evolve according to equations (11) and (12) respectively.

Lemma 2. Given the random sequence $\{\theta(\tau)\}$ generated by the update equation (8), we have

$$\mathrm{MSE}(\theta(\tau)) = \underbrace{\frac{1}{n}\,\mathrm{var}(\alpha(\tau))}_{e_1(\tau)} + \underbrace{\frac{1}{n}\,\mathbb{E}\big[\|\beta(\tau)\|_2^2\big]}_{e_2(\tau)}. \qquad (13)$$

Furthermore, $e_1(\tau)$ and $e_2(\tau)$ satisfy the following bounds:

(a) For each iteration r = 1, 2, . . ., we have 

(b) Moreover, for each iteration t = 1,2, ... we have 

[MS)? r + 1-1 + 



e2(r) < + 62(0) (15) 



From Lemma m we conclude that in order to guarantee an 0( ,"'^L,A bound on the MSB, it suffices to 
take T such that 

^TI^ - e2(0)[A2(S)]2' r + i-1 

Note that the first inequality is satisfied when r > [A2(5')]^. Moreover, doing a little bit of algebra, 
one can see that t = | log | — — 1) is sufficient to satisfy the second inequality. Accordingly, we take 

r-max|^iog^, ^^^^ 
outer iterations. 

The last part of the proof is to bound the second smallest eigenvalue of the Laplacian matrix $S$. The following lemma, which we prove in Section IV-D to follow, addresses this issue. Recall that $\lambda_2(\cdot)$ denotes the second smallest eigenvalue of a matrix.

Lemma 3. The averaged matrix $S$ that arises from our protocol has the following properties:

(a) For a cycle and a regular grid we have $\lambda_2(S) = \Omega(1)$, and

(b) for a random geometric graph, we have $\lambda_2(S) = \Omega\big(\frac{1}{\log n}\big)$, with high probability.

It is important to note that the averaged matrix S is not the same as the graph Laplacian that would arise from 
standard averaging on these graphs. Rather, as a consequence of establishing many paths and averaging along 
them in each inner phase, our protocol ensures that the matrix behaves essentially like the graph Laplacian 
for the fully connected graph. 

As established previously, each outer step requires M = 0{Dn) iterations. Therefore, we have shown 
that it is sufficient to take a total of



r = 0^Z)„max|-log-, 

transmissions per edge in order to guarantee a ,^ ""Ifi^ bound on the MSE. As we will see in the next section, 
assuming that the initial values are fixed, we have $e_1(0) = 0$, hence $\mathrm{MSE}(\theta(0)) = e_2(0)$. The claims in Theorem 1 then follow by standard calculations of the diameters of the various graphs and the result of Lemma 3.

It remains to prove the two technical results, Lemmas 2 and 3, and we do so in the following sections.



C. Proof of Lemma 2
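We begin by observing that

$$\mathbb{E}\big[(\theta(\tau) - \bar{\theta}\mathbf{1})(\theta(\tau) - \bar{\theta}\mathbf{1})^T\big] = F_1 + F_2 + F_3,$$

where $F_1 := \mathbb{E}\big[(\alpha(\tau) - \sqrt{n}\,\bar{\theta})^2\big]\,\mathbf{1}\mathbf{1}^T/n$, the second term is given by $F_2 := \mathbb{E}\big[\tilde{U}\beta(\tau)\beta(\tau)^T\tilde{U}^T\big]$, and

$$F_3 := \mathbb{E}\Big[(\alpha(\tau) - \sqrt{n}\,\bar{\theta})\,\frac{\mathbf{1}}{\sqrt{n}}\,\beta(\tau)^T\tilde{U}^T\Big] + \mathbb{E}\Big[(\alpha(\tau) - \sqrt{n}\,\bar{\theta})\,\tilde{U}\beta(\tau)\,\frac{\mathbf{1}^T}{\sqrt{n}}\Big].$$

Since $\tilde{U}$ has orthonormal columns, all orthogonal to the all-ones vector ($\mathbf{1}^T\tilde{U} = 0$), it follows that $\operatorname{trace}(F_2) = \mathbb{E}\big[\|\beta(\tau)\|_2^2\big]$, and $\operatorname{trace}(F_3) = 0$.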
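The orthogonal splitting of the error used above can be checked numerically. The sketch below is illustrative only: it relies on the fact that $[\mathbf{1}/\sqrt{n}, \tilde{U}]$ is an orthonormal basis, so that $\|\beta\|_2^2 = \|\theta\|_2^2 - \alpha^2$ and no explicit $\tilde{U}$ is needed. The function name, the random test vector, and the value used for $\bar{\theta}$ are assumptions made for the example.

```python
import math
import random


def split_error(theta, theta_bar):
    """Return (error along the all-ones direction, orthogonal error, total).

    alpha = <1/sqrt(n), theta>, and ||beta||^2 = ||theta||^2 - alpha^2
    because [1/sqrt(n), U-tilde] is an orthonormal basis.
    """
    n = len(theta)
    alpha = sum(theta) / math.sqrt(n)
    beta_sq = sum(x * x for x in theta) - alpha ** 2
    along_ones = (alpha - math.sqrt(n) * theta_bar) ** 2
    total = sum((x - theta_bar) ** 2 for x in theta)
    return along_ones, beta_sq, total


random.seed(0)
theta = [random.gauss(1.0, 1.0) for _ in range(8)]
theta_bar = 0.9  # hypothetical target average, not the mean of theta
along_ones, beta_sq, total = split_error(theta, theta_bar)
# The two components add up to the total squared error ||theta - theta_bar 1||^2.
print(abs(along_ones + beta_sq - total) < 1e-9)
```

Dividing both sides by $n$ and taking expectations gives the two terms $e_1(\tau)$ and $e_2(\tau)$ of the MSE decomposition.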

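It remains to compute $\operatorname{trace}(F_1)$. Unwrapping the recursion (11) and using the fact that initialization $\theta(0)$ implies $\alpha(0) = \sqrt{n}\,\bar{\theta}$ yields

$$\alpha(\tau) = \sqrt{n}\,\bar{\theta} - \sum_{l=0}^{\tau-1} \epsilon(l)\Big\langle \frac{\mathbf{1}}{\sqrt{n}},\, v(l)\Big\rangle, \tag{16}$$

for all $\tau = 1, 2, \ldots$. Since $v(l)$, $l = 0, 1, \ldots, \tau - 1$, are zero-mean random vectors, from equation (16) we conclude that $\mathbb{E}[\alpha(\tau)] = \sqrt{n}\,\bar{\theta}$ and accordingly, $\operatorname{trace}(F_1) = \operatorname{var}(\alpha(\tau))$. Recalling the definition of the MSE (1) and combining the pieces yields the claim (13).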

*Here we have assumed that the initial values, $\theta_i(0)$, $i = 1, 2, \ldots, n$, are given (fixed).

(a) From equation (16), it is clear that each $\alpha(\tau)$ is Gaussian with mean $\sqrt{n}\,\bar{\theta}$. It remains to bound the variance. Using the i.i.d. nature of the sequence $v(l) \sim \mathcal{N}(0, C)$, we have



var (a( 



T-l 



1=0 

T-l 



z=o 



where we have recalled the rescaled quantities (|7}. Recalling the fact that C'^^ < cr^ and using the Cauchy- 



Schwarz inequality, we have C-j < ^jC'-C'-- < o"^. Hence, we obtain 



T-l 



var (a(T)) < na^ Xl^'(^)' 



1=0 

2 "^-1 



< 



[A2(5)P^(i + 0^ 

-2 /-^ 1 _ ^^2^ 



710" 



1 



from which rescaling by 1/n establishes the bound ([T3 



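(b) Defining $H(\beta(\tau), v(\tau)) = \tilde{L}(\tau)\beta(\tau) + \tilde{U}^T v(\tau)$, the update equation (12) can be written as

$$\beta(\tau+1) = \beta(\tau) - \epsilon(\tau)\,H(\beta(\tau), v(\tau)),$$

for $\tau = 1, 2, \ldots$. In order to upper bound $e_2(\tau+1)$, defined in (13), we need to control $e_2(\tau+1) - e_2(\tau)$. Doing some algebra yields

$$e_2(\tau+1) - e_2(\tau) = \frac{1}{n}\,\mathbb{E}\big[\langle \beta(\tau+1) - \beta(\tau),\; \beta(\tau+1) + \beta(\tau)\rangle\big]$$

$$= \frac{1}{n}\,\mathbb{E}\big[\langle -\epsilon(\tau)H(\beta(\tau), v(\tau)),\; -\epsilon(\tau)H(\beta(\tau), v(\tau)) + 2\beta(\tau)\rangle\big],$$

and hence

$$e_2(\tau+1) - e_2(\tau) = \frac{1}{n}\,\epsilon(\tau)^2\,\mathbb{E}\big[\|H(\beta(\tau), v(\tau))\|_2^2\big] - \frac{2\epsilon(\tau)}{n}\,\mathbb{E}\big[\langle H(\beta(\tau), v(\tau)),\, \beta(\tau)\rangle\big].$$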



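Since $\beta(\tau)$ is independent of both $\tilde{L}(\tau)$ and $v(\tau)$, by conditioning on $\beta(\tau)$ and using the tower property of expectation, we obtain

$$\mathbb{E}\big[\langle H(\beta(\tau), v(\tau)),\, \beta(\tau)\rangle\big] = \mathbb{E}\big[\langle \mathbb{E}[\tilde{L}]\,\beta(\tau),\, \beta(\tau)\rangle\big].$$

By construction all the eigenvalues of $\mathbb{E}[\tilde{L}]$ are at least one (recall that $\lambda_2(\bar{L}) = 1$), hence

$$\langle \mathbb{E}[\tilde{L}]\,\beta(\tau),\, \beta(\tau)\rangle \ge \|\beta(\tau)\|_2^2.$$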



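Putting the pieces together, we obtain

$$e_2(\tau+1) \le \frac{1}{n}\,\epsilon(\tau)^2\,\mathbb{E}\big[\|H(\beta(\tau), v(\tau))\|_2^2\big] + (1 - 2\epsilon(\tau))\,e_2(\tau)$$

$$= \frac{1}{n}\,\epsilon(\tau)^2\,\underbrace{\mathbb{E}\big[\|\tilde{L}(\tau)\beta(\tau)\|_2^2\big]}_{F_1} + \frac{1}{n}\,\epsilon(\tau)^2\,\underbrace{\mathbb{E}\big[\|\tilde{U}^T v(\tau)\|_2^2\big]}_{F_2} + (1 - 2\epsilon(\tau))\,e_2(\tau), \tag{17}$$

where we used the fact that $\mathbb{E}\big[\langle \tilde{L}(\tau)\beta(\tau),\, \tilde{U}^T v(\tau)\rangle\big] = 0$. We continue by upper bounding the terms $F_1 = \mathbb{E}\big[\|\tilde{L}(\tau)\beta(\tau)\|_2^2\big]$ and $F_2 = \mathbb{E}\big[\|\tilde{U}^T v(\tau)\|_2^2\big]$.

First, we bound the former. By definition of the $\ell_2$-operator norm, we have

$$\mathbb{E}\big[\|\tilde{L}(\tau)\beta(\tau)\|_2^2\big] \le \mathbb{E}\big[|||\tilde{L}(\tau)|||_2^2\; \|\beta(\tau)\|_2^2\big].$$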



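On the other hand, using the fact that $L(\tau) = \frac{1}{\lambda_2(S)}(I - W(\tau))$ (recall the identities of Section IV-A) yields

$$|||\tilde{L}(\tau)|||_2 \le \frac{1}{\lambda_2(S)}\big(1 + |||W(\tau)|||_2\big) \le \frac{2}{\lambda_2(S)}.$$

Therefore, we have the following bound on $F_1$:

$$F_1 \le \frac{4}{[\lambda_2(S)]^2}\,\mathbb{E}\big[\|\beta(\tau)\|_2^2\big] = \frac{4n}{[\lambda_2(S)]^2}\,e_2(\tau).$$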



Turning to term F2, we have 

F2 = E 



n 



< trace ( cov(t>(T))) < 



na 



(18) 



(19) 



Substituting the inequalities (fTSl) and ( fT9b into (fTTl) . we obtain the following recursive bound on e2(r + 1) 



e2(r + l) < 



e(r)2 + 1 - 2e(T) + 



62 (r). 



Let $v$ be an eigenvector of the matrix $W(\tau)$ corresponding to the eigenvalue $\lambda \ne 1$. Since $\mathbf{1}^T v = 0$, there exists an $(n-1)$-dimensional vector $u$ such that $v = \tilde{U}u$. Therefore we have

$$\tilde{U}^T(I - W(\tau))\tilde{U}u = \tilde{U}^T(I - W(\tau))v = (1 - \lambda)\tilde{U}^T v = (1 - \lambda)u.$$

So by subtracting the eigenvalues of $\tilde{U}^T(I - W(\tau))\tilde{U}$ from one, we obtain the non-one eigenvalues of $W(\tau)$.
Recall the definitions © and If 6 < Mil!, then 1 - 2e(r) + < 1 - e(r), and hence we have

2 

e2(r + 1) < TT^eirf + (1 - 6(T))e2(T), (20) 
for all T = 1, 2, • • • . Unwrapping the inequality (l20l ) yields 

2 T T r 

e2(r + 1) < J] e(A:)2 [] (1 - + 11(1 - e{l)) eM- (21) 

fe^o l=k+l 1=0 

On the other hand, the product $\prod_{l=k+1}^{\tau}(1 - \epsilon(l))$ telescopes and is equal to $\frac{k + 1/\delta}{\tau + 1/\delta}$. Substituting this fact into the equation (21) yields
where step (a) uses the following inequality 



valid for S e (0, i). 
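The telescoping identity used in this step is easy to verify numerically. The following sketch assumes the step size $\epsilon(\tau) = 1/(\tau + 1/\delta)$, so that each factor equals $(l + 1/\delta - 1)/(l + 1/\delta)$ and the product over $l = k+1, \ldots, \tau$ collapses to $(k + 1/\delta)/(\tau + 1/\delta)$. The function names and the particular values of $k$, $\tau$, and $\delta$ are illustrative choices.

```python
def step_size(tau, delta):
    """Step size eps(tau) = 1 / (tau + 1/delta), as assumed in this sketch."""
    return 1.0 / (tau + 1.0 / delta)


def tail_product(k, tau, delta):
    """Product of (1 - eps(l)) for l = k+1, ..., tau."""
    prod = 1.0
    for l in range(k + 1, tau + 1):
        prod *= 1.0 - step_size(l, delta)
    return prod


delta, k, tau = 0.1, 3, 50
closed_form = (k + 1.0 / delta) / (tau + 1.0 / delta)  # 13 / 60 here
# The direct product and the closed form agree up to rounding error.
print(abs(tail_product(k, tau, delta) - closed_form) < 1e-12)
```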



D. Proof of Lemma 3

In the case of a cycle, there is only one averaging path and all the nodes are involved in it at each round, so the averaging matrix $W$ is fixed. More precisely, we have $W = \bar{W} = \frac{1}{n}\mathbf{1}\mathbf{1}^T$. Therefore, $\bar{W}$ is a rank-one matrix with $\lambda_{n-1}(\bar{W}) = 0$, and accordingly we have $\lambda_2(S) = 1 - \lambda_{n-1}(\bar{W}) = 1$.

For the case of grid or random geometric graphs, we use the Poincaré inequality [11]. A version of this theorem can be stated as follows: Let $A = [a_{ij}]$ denote the transition matrix of an irreducible, aperiodic, time-reversible Markov chain with stationary distribution $\pi$. For each ordered pair of nodes $(s, u)$ in the transition diagram, choose one and only one path $\eta_{su} = (s, s_1, s_2, \ldots, s_l, u)$ between $s$ and $u$ and define

$$|\eta_{su}| := \frac{1}{\pi(s)\,a_{s s_1}} + \frac{1}{\pi(s_1)\,a_{s_1 s_2}} + \cdots + \frac{1}{\pi(s_l)\,a_{s_l u}}. \tag{22}$$

Then the Poincaré coefficient is

$$\kappa := \max_{e \in E'} \sum_{\eta_{su} \ni e} |\eta_{su}|\,\pi(s)\,\pi(u), \tag{23}$$

where $E'$ is the set of directed edges formed in the previous step. Defining this quantity, the theorem states that $\lambda_{n-1}(A) \le 1 - \frac{1}{\kappa}$, or equivalently,

$$1 - \lambda_{n-1}(A) \ge \frac{1}{\kappa}. \tag{24}$$



We apply this theorem to the Markov chain formed by $\bar{W}$; the idea is to upper bound its Poincaré coefficient.

1) Grid: We first define a path $\eta_{su}$ for every pair of nodes $\{s, u\}$. Two different cases can be distinguished here. For an illustration of the path $\eta_{su}$ see Figure 4.

a) Case 1: Nodes $s$ and $u$ do not belong to the same column or row. In this case, we consider a two-hop path $\eta_{su} = (s \to w \to u)$, where $w = (x_u, y_s)$ is the vertex of the rectangle constructed by $s$ and $u$. Note that $x_u$ is the x-coordinate of $u$ and $y_s$ is the y-coordinate of $s$. Since nodes $\{s, w\}$ and $\{w, u\}$ are averaged $\frac{1}{2}$ of the time, we have $\bar{W}_{sw} = \bar{W}_{wu} = \frac{1}{2m}$. Substituting this into (22) and using the fact that $\pi = \frac{1}{n}\mathbf{1}$ yields

$$|\eta_{su}| = \frac{1}{\pi(s)\,\bar{W}_{sw}} + \frac{1}{\pi(w)\,\bar{W}_{wu}} = 4mn.$$

b) Case 2: Nodes $s$ and $u$ belong to the same row or column. In this case, we set $\eta_{su} = (s \to u)$, which leads to

$$|\eta_{su}| = \frac{1}{\pi(s)\,\bar{W}_{su}} = 2mn.$$

Moreover, a given edge $e = (s \to w)$ is involved in at most $m$ paths. As node $u$ varies in the corresponding column or row, we obtain $m - 1$ paths in case 1, and one path in case 2.
Combining the pieces, and using $n = m^2$ for the grid, we compute the Poincaré coefficient

$$\kappa = \max_{e \in E'} \sum_{\eta_{su} \ni e} |\eta_{su}|\,\pi(s)\,\pi(u) \le m\,\frac{4mn}{n^2} = 4.$$

Finally, from equation (24), we have

$$\lambda_2(S) = 1 - \lambda_{n-1}(\bar{W}) \ge \frac{1}{\kappa} \ge \frac{1}{4},$$

which concludes the proof for the case of a grid-structured graph.
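The path-counting argument for the grid can be reproduced by brute force on a small example. The sketch below enumerates the paths chosen in the two cases and accumulates the load on every directed edge. It is an illustration under stated assumptions, not the protocol itself: it takes $\bar{W}_{sw} = 1/(2m)$ for every pair of distinct nodes sharing a row or column, a uniform stationary distribution $\pi = 1/n$ with $n = m^2$, and a hypothetical function name.

```python
from collections import defaultdict
from itertools import product


def grid_poincare_coefficient(m):
    """Poincare coefficient for the path choice in the text on an m x m grid.

    Assumptions: W-bar[s][w] = 1/(2m) whenever s != w share a row or column,
    and the stationary distribution is uniform, pi = 1/n with n = m*m.
    """
    n = m * m
    pi = 1.0 / n
    w_edge = 1.0 / (2 * m)
    edge_cost = 1.0 / (pi * w_edge)       # one term of |eta_su|, equals 2mn
    load = defaultdict(float)             # directed edge -> accumulated sum
    nodes = list(product(range(m), range(m)))   # node = (x, y)
    for s, u in product(nodes, nodes):
        if s == u:
            continue
        if s[0] == u[0] or s[1] == u[1]:  # Case 2: direct hop s -> u
            path = [(s, u)]
        else:                             # Case 1: corner w = (x_u, y_s)
            w = (u[0], s[1])
            path = [(s, w), (w, u)]
        length = edge_cost * len(path)    # |eta_su|
        for e in path:
            load[e] += length * pi * pi   # |eta_su| * pi(s) * pi(u)
    return max(load.values())


kappa = grid_poincare_coefficient(4)
print(kappa)  # 3.5, which is below the bound of 4
```

Under these assumptions every used edge carries $m - 1$ two-hop paths and one direct path, so the exact value is $4 - 2/m$, consistent with the bound $\kappa \le 4$.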

2) Random geometric graph: For the RGG, we follow the same proof structure: namely, we first find a path for each pair of nodes $\{s, u\}$, and then upper bound the Poincaré coefficient for the Markov chain $\bar{W}$. We first introduce some useful notation. Let $\mathcal{C} : V \to \{1, 2, \ldots, m\}^2$ be the mapping that takes a node as its input and returns the sub-square of that node. More precisely, for some $s \in V$ we have

$$\mathcal{C}(s) = (i, j) \quad \text{if } s \in (i, j)\text{-th square}, \quad i, j = 1, 2, \ldots, m.$$
Fig. 4. Illustration of the path $\eta_{su}$ for a grid-structured graph. (a) Case 1, where nodes $s$ and $u$ do not belong to the same column or row. (b) Case 2, where nodes $s$ and $u$ belong to the same column or row. This choice of $\eta_{su}$ yields a tight upper bound on the Poincaré coefficient.



Furthermore, we enumerate the nodes in square $\mathcal{C}(s) = (i, j)$ from 1 to $n_{ij}$, where $n_{ij}$ denotes the total number of nodes in $\mathcal{C}(s)$. We refer to the label of node $s$ as $\mathcal{N}_{\mathcal{C}(s)}(s)$, where $\mathcal{N}_{\mathcal{C}(s)}(\cdot)$ is the enumeration operator for the square $\mathcal{C}(s)$. Also let $n^* = \min_{i,j} n_{ij}$ denote the minimum number of nodes in one sub-square, which by assumption is greater than $a \log n$ for some constant $a$. We split the problem into three different cases. Figure 5 illustrates these three different cases.

a) Case 1: Nodes $s$ and $u$ do not belong to the same column or row. In this case, a two-hop path $\eta_{su} = (s \to w \to u)$ is considered. First, we pick $\mathcal{C}(w)$, the vertex of the rectangle constructed by $\mathcal{C}(s)$ and $\mathcal{C}(u)$ with the same x-coordinate as $\mathcal{C}(u)$ and the same y-coordinate as $\mathcal{C}(s)$. Now choose a node $w$ inside $\mathcal{C}(w)$ such that

$$\mathcal{N}_{\mathcal{C}(w)}(w) = \mathcal{N}_{\mathcal{C}(s)}(s) + \mathcal{N}_{\mathcal{C}(u)}(u) \mod n^*. \tag{25}$$

Since each square has at least $n^*$ nodes, such a choice can be made. On the other hand, since nodes in each square are picked uniformly at random in the averaging phase and there are at most $b \log n$ nodes in each square (for some constant $b$), we have $\bar{W}_{sw}, \bar{W}_{wu} \ge \frac{1}{2m(b \log n)^2}$, where the factor of 2 is due to the choice of the averaging direction. Substituting this inequality into (22), we obtain

$$|\eta_{su}| = \frac{1}{\pi(s)\,\bar{W}_{sw}} + \frac{1}{\pi(w)\,\bar{W}_{wu}} \le 4b^2 mn(\log n)^2.$$

Furthermore, from equation (25), we see that for a fixed $s$ there are at most $\frac{b}{a}$ nodes in the square $\mathcal{C}(u)$ that result in choosing $w$. Therefore, edge $e : (s \to w)$ is involved in at most $\frac{b}{a}(m - 1)$ such paths.
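The relay-selection rule (25) and the resulting collision count can be illustrated with a short sketch. It assumes labels run from 1 to the number of nodes in a square, and it maps a residue of 0 to $n^*$ so that the result is a valid label; this convention, the function names, and the example sizes are assumptions made for illustration. With at most $b \log n$ nodes per square and $n^* \ge a \log n$, the collision count $\lceil n_u / n^* \rceil$ is of order $b/a$.

```python
from collections import Counter
from math import ceil


def relay_label(label_s, label_u, n_star):
    """Label of the relay node w per rule (25), reported in 1..n_star.

    Assumption: a residue of 0 is mapped to n_star so that the result is a
    valid label under the 1-based enumeration used in the text.
    """
    r = (label_s + label_u) % n_star
    return r if r != 0 else n_star


def max_collisions(label_s, n_u, n_star):
    """Largest number of nodes u in one square that map to the same relay."""
    counts = Counter(
        relay_label(label_s, label_u, n_star) for label_u in range(1, n_u + 1)
    )
    return max(counts.values())


# Hypothetical square with n_u = 12 nodes and minimum occupancy n_star = 5.
worst = max(max_collisions(label_s, 12, 5) for label_s in range(1, 13))
print(worst, ceil(12 / 5))  # 3 3
```

For a fixed source label, the rule spreads the nodes of $\mathcal{C}(u)$ almost evenly over the $n^*$ candidate relays, which is what keeps the number of paths through any single edge small.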

b) Case 2: Nodes $s$ and $u$ belong to the same row or column. In this case, by setting $\eta_{su} = (s \to u)$, we obtain

$$|\eta_{su}| = \frac{1}{\pi(s)\,\bar{W}_{su}} \le 2b^2 mn(\log n)^2.$$

Note that there is only one path containing $e$ of this type.

c) Case 3: Nodes $s$ and $u$ belong to the same square, meaning $\mathcal{C}(s) = \mathcal{C}(u)$. In this case a node $w$ is chosen in a square adjacent to $\mathcal{C}(s)$ according to (25) such that $\mathcal{C}(w)$ is to the right of $\mathcal{C}(s)$, unless $\mathcal{C}(s)$ is in the last column, in which case $\mathcal{C}(w)$ is to the left of $\mathcal{C}(s)$. The same argument as in Case 1 gives a bound on $|\eta_{su}|$. As for the upper bound on the number of paths: the edge $e : (s \to w)$ is involved in at most $\frac{b}{a}$ such paths.

Fig. 5. Illustration of the path $\eta_{su}$ for the case of RGG. (a) Case 1, where nodes $s$ and $u$ belong to sub-squares in different rows and columns. (b) Case 2, where nodes $s$ and $u$ belong to sub-squares in the same row or column. (c) Case 3, where nodes $s$ and $u$ belong to the same square.

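Combining all the pieces, we obtain

$$|\eta_{su}| \le 4b^2 mn(\log n)^2 \quad \forall\, s, u \in V,$$

and

$$\max_{e \in E'} \sum_{\eta_{su}} \mathbb{I}\{\eta_{su} \ni e\} \le \frac{b}{a}\,m + 1.$$

Substituting these two inequalities into (23) yields

$$\kappa \le \Big(\frac{b}{a}\,m + 1\Big)\frac{4b^2 mn(\log n)^2}{n^2} \le \frac{2mb}{a}\cdot\frac{4b^2 mn(\log n)^2}{n^2} = c_1 \log n$$

for some constant $c_1$. Therefore, from the Poincaré theorem, we have

$$\lambda_2(S) = 1 - \lambda_{n-1}(\bar{W}) \ge \frac{1}{\kappa} \ge \frac{1}{c_1 \log n},$$

which concludes the second part of Lemma 3.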






E. Proof of part (a) of Theorem 1

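We now return to the proof of part (a) of Theorem 1. Combining equations (10) and (16) yields

$$\theta(\tau) = (\bar{\theta} - w(\tau))\,\mathbf{1} + \tilde{U}\beta(\tau), \tag{26}$$

where $w(\tau) := \sum_{l=0}^{\tau-1} \epsilon(l)\,\langle \mathbf{1}/n,\, v(l)\rangle$. As previously established, we know that $\mathbb{E}[w(\tau)] = 0$ and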
var(t(;(r)) < , "",^,^.,3 for all r = 1,2,---. Therefore, invoking a result on convergence of series with 
bounded variance (Theorem 8.3 from Chapter 1 of [14]), we have

$$w(\tau) \xrightarrow{\text{a.s.}} w \quad \text{as } \tau \to \infty, \tag{27}$$

for some random variable $w$. Since $w(\tau)$ is a sum of independent Gaussian random variables (and hence Gaussian), it is absolutely integrable [14]. Therefore, we have $\mathbb{E}[w] = \lim_{\tau \to \infty} \mathbb{E}[w(\tau)] = 0$, and also
var {w) = lim^^oo var {w{t)) < p^f^- 

Now we move on to the next part of the proof, analyzing the sequence $\{\beta(\tau)\}_{\tau=1}^{\infty}$ using techniques from stochastic approximation theory (e.g., see the books [21], [6]). These techniques apply to recursions that generate a state sequence $\{\theta(t)\}_{t=1}^{\infty}$ according to

$$\theta(t+1) = \theta(t) - \epsilon(t)\,H(\theta(t), v(t)), \quad t = 1, 2, \ldots,$$

where $v(t)$ is the noise vector that models the randomness coming into play in the algorithm. The parameter $\epsilon(t)$ is a positive step size, and the sequence $\{\epsilon(t)\}_{t=1}^{\infty}$ is required to satisfy the conditions $\sum_{t=1}^{\infty} \epsilon(t) = \infty$ and $\sum_{t=1}^{\infty} \epsilon(t)^a < \infty$ for some $a > 1$. The asymptotic behavior of these stochastic updates can be analyzed in terms of the ordinary differential equation (ODE)

$$\frac{d\gamma}{dt} = -h(\gamma), \tag{28}$$

where $h(\theta) := \mathbb{E}[H(\theta, v)]$. Under mild regularity conditions, it is known that $\theta(t) \to \gamma^*$ almost surely, where $\gamma^*$ is the attractor of the ODE (28).

Recalling the update equation (12), our problem can be cast within this framework. In particular, the state sequence is $\{\beta(\tau)\}_{\tau=1}^{\infty}$, the noise sequence is formed by zero-mean i.i.d. random vectors, the decreasing step-size sequence is $\epsilon(\tau) = 1/(\frac{1}{\delta} + \tau)$, and finally $H(\beta, v) = \tilde{L}\beta + \tilde{U}^T v$ is a linear function with $h(\beta) = \mathbb{E}[\tilde{L}]\,\beta$. Note that because we removed the zero eigenvalue from the average Laplacian matrix, the matrix $\mathbb{E}[\tilde{L}]$ has all positive eigenvalues, and so $\gamma^* = 0$ is the unique stable point of the linear differential equation $\frac{d\gamma}{dt} = -\mathbb{E}[\tilde{L}]\,\gamma$. Therefore, an application of the ODE method [21], [6] guarantees that

$$\beta(\tau) \xrightarrow{\text{a.s.}} 0 \quad \text{as } \tau \to \infty. \tag{29}$$
Substituting the results (27) and (29) into equation (26), we obtain

$$\theta(\tau) \xrightarrow{\text{a.s.}} (\bar{\theta} - w)\,\mathbf{1} \quad \text{as } \tau \to \infty.$$

In other words, nodes will almost surely reach a consensus; moreover, the consensus value, 9 = 9 — w, is 
within distance of the true sample mean. 
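To make the stochastic-approximation argument concrete, here is a minimal scalar sketch of a recursion of the form (12). It is not the paper's algorithm: the matrix $\mathbb{E}[\tilde{L}]$ is replaced by the scalar 1, the noise is taken to be i.i.d. $\mathcal{N}(0, \sigma^2)$, and the step size is assumed to be $\epsilon(\tau) = 1/(\tau + 1/\delta)$. The function name, seed, and parameter values are illustrative.

```python
import random


def run_recursion(beta0, delta, sigma, num_steps, seed=0):
    """Scalar analogue of (12): beta <- beta - eps(tau) * (beta + noise).

    Assumptions: E[L-tilde] is replaced by the scalar 1, the noise is i.i.d.
    N(0, sigma^2), and eps(tau) = 1 / (tau + 1/delta).
    """
    rng = random.Random(seed)
    beta = beta0
    for tau in range(num_steps):
        eps = 1.0 / (tau + 1.0 / delta)
        beta -= eps * (beta + rng.gauss(0.0, sigma))
    return beta


# Without noise the iterate follows the telescoping product exactly:
# beta(T) = beta(0) * (1/delta - 1) / (T + 1/delta - 1).
noise_free = run_recursion(1.0, 0.1, 0.0, 1000)
# With noise the iterate still drifts toward the stable point 0 of the ODE.
noisy = run_recursion(1.0, 0.1, 1.0, 100000)
print(noise_free, abs(noisy) < 0.05)
```

In this scalar setting the noisy iterate after $T$ steps has standard deviation of order $\sigma/\sqrt{T}$, which illustrates why the decaying step size averages out the channel noise.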

V. Simulation Results 

In order to demonstrate the effectiveness of the proposed algorithm, we conducted a set of simulations. More specifically, we apply the proposed algorithm to four-nearest-neighbor square grids of different sizes. We initially generate the data $\theta_i(0)$, $i = 1, 2, \ldots, n$, as random $\mathcal{N}(1, 1)$ variables and fix them throughout the simulation, so that the initial data is the same for each run of the algorithm. In implementing the algorithm, we adopt $\sigma^2 = 1$ as the channel noise variance, and we set the tolerance parameter $\delta = 0.1$, leading to the step size $\epsilon(\tau) = \frac{1}{\tau + 10}$. We estimated the mean-squared error, defined in equation (1), by taking the average over 50 sample paths. As discussed in Section III, every outer phase update requires $M = \mathcal{O}(\sqrt{n})$ time steps.

Figure 6 shows the mean-squared error versus the number of outer loop iterations; the panel contains two different curves, one for a graph with $n = 30^2$ nodes, and the other for $n = 50^2$ nodes. As expected, the MSE monotonically decreases as the number of iterations increases, showing convergence of the algorithm. More importantly, the gap between the two plots is negligible. This phenomenon, which is predicted by our theory, is explored further in our next set of experiments.

In order to study the network scaling of the grid more precisely, for a given set of graph sizes, we compute the number of outer iterations $\tau = \tau(n, \delta)$ such that $\mathrm{MSE}(\theta(\tau M)) \le \sigma^2\delta$. Recall that this stopping time is the focus of Theorem 1(b). Figure 7 provides a box plot of this stopping time $\tau$ versus the graph size $n$. Theorem 1(b) predicts that this stopping time should be inversely proportional to the spectral gap of the Laplacian matrix $S$, which for the grid scales as $\Omega(1)$ (in particular, see Lemma 3). As shown in Figure 7, over a range of graphs of size varying from $n = 1000$ to $n = 10000$, the stopping time is roughly constant ($\tau \approx 25$), which is consistent with the theory.

Fig. 6. Mean-squared error versus the number of outer loop iterations for grids with $n \in \{30^2, 50^2\}$ nodes. As expected, the MSE monotonically decreases, which supports the convergence claim.

Fig. 7. Stopping time $\tau = \tau(n, \delta)$ vs. the graph size $n$. For different graph sizes, we compute the first outer phase time instance, $\tau(n, \delta)$, such that $\mathrm{MSE}(\theta(\tau M)) \le \sigma^2\delta$. Here we have fixed the parameters to $\sigma^2 = 1$ and $\delta = 0.1$. Over a range of graphs of size varying from 1000 to 10000, this stopping time is roughly constant ($\approx 25$), which is consistent with the theory (Theorem 1(b) and Lemma 3).

VI. Discussion

In this paper, we proposed and analyzed a two-phase graph-respecting algorithm for computing averages in a network, where communication is modeled as an additive white Gaussian noise channel. We showed that it achieves consensus, and we characterized the rate of convergence as a function of the graph topology and graph size. For our algorithm, this network scaling is within logarithmic factors of the graph diameter, showing that it is near-optimal, since the graph diameter provides a lower bound for any algorithm.

There are various issues left open in this work. First, while the AWGN model is more realistic than 
noiseless communication, many channels in wireless networks may be more complicated, for instance 
involving fading, interference and other types of memory. In principle, our algorithm could be applied 
to such channels and networks, but its behavior and associated convergence rates remain to be analyzed. In 
a separate direction, it is also worth noting that gossip-type algorithms can be used to solve more complicated 
types of problems, such as distributed optimization problems (e.g., ESl . |[28l . llT3l ). Studying the issue of 
near-optimal network scaling for such problems is also of interest. 